A number of barriers stand in the way of achieving high availability. For
example, in a poor implementation, Exchange might be installed on
improperly sized servers without following best practices. An Exchange
messaging environment can be deployed this way in a short time, but many
important details are likely to be missed, and availability will suffer
as a result.
By contrast, in a
high-availability environment the messaging system deployment is well
designed. During the deployment project, organizational messaging requirements are researched. The current messaging environment
is examined for inadequacies and solutions are identified. Research
into how best to deploy Exchange may go on for an extended period while
consultants are brought in to help build a design. Vendors are also
brought in to discuss how their products will work and how they can
contribute to running a highly available system. Hardware is sized and
tested to meet both business and technical requirements, such as
service-level agreements (SLAs), recovery point objectives, and cost
considerations. Hardware is selected with the required level of
fault-tolerant components, such as redundant memory, drives, network
connections, cooling fans, power supplies, and so on.
A high-availability environment also incorporates a significant amount of
design, planning, and testing. It often, but not always, includes
additional features such as failover clustering and load balancing,
which are designed to decrease downtime by enabling rolling upgrades and
allowing for a preplanned response to failures.
The messaging client software and its configuration can also improve
availability. For example, Outlook 2003 and later offers Cached Exchange
Mode, which allows users to create new messages, respond to existing
mail in their Inboxes, and manage their calendars (among many other
tasks) even if the connection to the Exchange server is lost. Users can
continue working locally while the Exchange server is down for a short
time. When the connection to the Exchange server is restored, any
changes made are synchronized.
In the end, all critical business systems must be analyzed to
understand the cost incurred when they are unavailable. If downtime has
a significant cost, the organization should take steps to minimize
downtime. This is particularly true if the cost of downtime is greater
than the cost of deploying a suitable highly available solution.
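The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and every figure (downtime hours, cost per hour, solution cost) are hypothetical, and the sketch assumes the cost of downtime scales linearly with its duration, which a real analysis would need to verify.

```python
def annual_downtime_cost(downtime_hours: float, cost_per_hour: float) -> float:
    """Estimated yearly cost of downtime, assuming cost scales linearly with time."""
    return downtime_hours * cost_per_hour


def high_availability_pays_off(
    baseline_downtime_hours: float,
    ha_downtime_hours: float,
    cost_per_hour: float,
    ha_annual_cost: float,
) -> bool:
    """True if the downtime cost avoided exceeds the yearly cost of the HA solution."""
    avoided_cost = annual_downtime_cost(
        baseline_downtime_hours, cost_per_hour
    ) - annual_downtime_cost(ha_downtime_hours, cost_per_hour)
    return avoided_cost > ha_annual_cost


# Hypothetical figures: 40 hours of downtime per year without the solution,
# 4 hours with it, 10,000 per hour of downtime, 150,000 per year for the solution.
print(high_availability_pays_off(40, 4, 10_000, 150_000))
```

With these illustrative numbers the avoided cost (360,000) exceeds the cost of the solution, so the investment would be justified. At 1,000 per hour of downtime it would not be.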
The opposite of
availability is downtime, both planned and unplanned. Planned downtime
is the result of scheduled events, such as maintenance. Unplanned
downtime is the result of unscheduled events. Events that cause
unplanned downtime can be minor, such as a faulty hardware driver or a
processor failure, or major, such as an earthquake, fire, or flood.
1. Measuring Availability
Availability is usually
expressed as the percentage of time that a service is available. As an
example, a requirement for 99.9 percent availability over a one-year
period of 24-hour days, 7 days a week allows for only about 8.76 hours
(8 hours, 46 minutes) of downtime, as shown in Table 1.
In complex environments, organizations specify availability targets for
each service. When dealing with an Exchange messaging environment,
availability goals may be tied to specific features such as Microsoft
Outlook Web App, Simple Mail Transfer Protocol (SMTP) message delivery,
and Outlook MAPI connectivity. These availability targets are then
turned into SLAs that hold the group operating the messaging system
accountable for meeting those targets. In some cases, if those targets
are not met, the salaries and bonuses of the employees and managers in
the responsible group can be affected. In some instances both planned
and unplanned downtime affect the overall availability target; in other
environments planned downtime is exempt from the availability target.
Because achieving high availability depends on update management to
mitigate potential downtime, some planned downtime is always required.
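Whether planned downtime counts against the target can decide whether an SLA is met. The following Python sketch shows the effect for a single year. The function name and the downtime figures are hypothetical, and the sketch assumes the measurement period stays at a full 24x7 year even when planned downtime is exempt; some SLAs instead subtract the maintenance windows from the measurement period.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year, 24x7 measurement period


def achieved_availability(
    unplanned_hours: float, planned_hours: float, planned_is_exempt: bool
) -> float:
    """Return achieved availability in percent under the given SLA policy."""
    counted_hours = unplanned_hours
    if not planned_is_exempt:
        # Planned maintenance also counts against the availability target.
        counted_hours += planned_hours
    return 100 * (1 - counted_hours / HOURS_PER_YEAR)


# Hypothetical year: 4 hours of unplanned and 12 hours of planned downtime.
for exempt in (True, False):
    result = achieved_availability(4, 12, planned_is_exempt=exempt)
    print(f"planned downtime exempt={exempt}: {result:.3f} percent")
```

In this illustrative year, a 99.9 percent target is met when planned downtime is exempt and missed when it is counted.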
Table 1. Permitted Downtime for Specific Availability Targets
AVAILABILITY TARGET | PERMITTED DOWNTIME ANNUALLY |
---|---|
99 percent | 87 hours, 36 minutes |
99.9 percent | 8 hours, 46 minutes |
99.99 percent | 52 minutes, 34 seconds |
99.999 percent | 5 minutes, 15 seconds |
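The figures in Table 1 follow from simple arithmetic: the permitted downtime is the unavailable fraction of the year multiplied by the length of the year. The short Python sketch below reproduces them, assuming a 365-day year of continuous (24x7) service. The function names are illustrative only.

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # non-leap year, 24x7 service window


def permitted_downtime(availability_percent: float) -> timedelta:
    """Return the downtime allowed per year for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


def format_downtime(downtime: timedelta) -> str:
    """Format a duration as hours, minutes, and whole seconds."""
    total_seconds = round(downtime.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours} h {minutes} min {seconds} s"


for target in (99, 99.9, 99.99, 99.999):
    print(f"{target} percent: {format_downtime(permitted_downtime(target))}")
```

For 99.9 percent this yields 8 h 45 min 36 s, which Table 1 rounds to 8 hours, 46 minutes.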
This background is not meant to detract from the features that Exchange
2010 provides to help achieve high availability; rather, it provides a
frame of reference for the discussion of the Exchange-specific
high-availability features that follows.
2. Exchange 2010 High-Availability Features
Exchange 2010 builds on the
solid foundation set by Exchange 2007 with regard to high availability.
Exchange 2007 introduced a number of new options for availability,
including cluster continuous replication (CCR), standby continuous replication (SCR), single copy cluster (SCC), and local continuous replication (LCR). Exchange 2010 introduces the Database
Availability Group (DAG), which combines the best functionality
available in Exchange 2007. A DAG is a group of up to 16 Exchange 2010
Mailbox servers that can each maintain up to 100 databases. Each
database can have up to 16 copies, which are kept up to date through
continuous replication.
The DAG differs from the high-availability options in Exchange Server 2007 SP1 in the following ways:
With CCR, there can be only two highly available copies of the database within the cluster; within the DAG there can be up to 16 copies of each database.
With SCR, the activation process required administrative intervention; within a DAG, failover between individual database copies can happen automatically.
With SCC, a single shared copy of the database consumes less storage but provides no redundancy for the data. Exchange Server 2010 has no configuration that replaces this functionality, although some third-party solutions may be able to provide similar functionality by using the Third Party Replication API.
With LCR, a single-server configuration allows two copies of a database to reside on different storage connected to the same server. No configuration in Exchange Server 2010 replaces this functionality.
Exchange 2010 provides
database-level failover within the DAG. A single database failure no
longer affects all mailbox databases on a server. Database failover
time has also been improved since Exchange 2007. The DAG also makes it
easier to implement site failover because now the DAG handles both
in-site and inter-site replication.
Exchange 2010 also has improved non-mailbox high availability. Transport servers now have a feature called shadow redundancy, which provides redundancy for in-transit messages.
Another improvement is online mailbox
moves. In previous versions of Exchange, mailboxes were moved offline,
which required users to disconnect their clients for the move to
complete. Because this process affects users, such mailbox moves are
usually scheduled during maintenance windows. During a migration
project, being able to move mailboxes only at night and on weekends
often does not provide enough time to complete the migration. The online
mailbox move feature allows mailboxes to be moved between databases
asynchronously without taking the user offline. Users maintain their
connection and keep working while their e-mail is moved in the
background. This reduces end-user downtime, improves availability for
end users, and allows mailbox migrations to be performed during
business hours.
More information about Exchange 2010 high-availability planning can be found in the Planning for High Availability and Site Resilience topic at http://technet.microsoft.com/en-us/library/dd638104.aspx.